test performance
OntheAccuracyofInfluenceFunctions forMeasuringGroupEffects
Influence functions estimate the effect of removing a training point on a model without theneedtoretrain. Theyarebasedonafirst-order Taylorapproximation thatisguaranteed tobeaccurate forsufficiently small changes tothemodel, and so are commonly used to study the effect of individual points in large datasets. However, we often want to study the effects of largegroups of training points, e.g., todiagnose batch effects orapportion credit between different data sources.
- Asia > Middle East > Jordan (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Singapore (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- North America > United States (0.04)
- (2 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (0.92)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > Middle East > Jordan (0.04)
0be50b4590f1c5fdf4c8feddd63c4f67-Supplemental-Datasets_and_Benchmarks.pdf
In Figure 1 we demonstrate the common neighbor (CN) distribution among positive and negative test samples for ogbl-collab, ogbl-ppa, and ogbl-citation2. These results demonstrate that a vast majority of negative samples have no CNs. Since CNs is a typically good heuristic, this makes it easy to identify most negative samples. We further present the CN distribution of Cora, Citeseer, Pubmed, and ogbl-ddi in Figure 3. The CN distribution of Cora, Citeseer, and Pubmed are consistent with our previous observations on the OGB datasets in Figure 1. We note that ogbl-ddi exhibits a different distribution with other datasets. As compared to the other datasets, most of the negative samples in ogbl-ddi have common neighbors. This is likely because ogbl-ddi is considerably denser than the other graphs.